{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# $$LightGBM\\ Feature\\ Importance\\ Tutorial$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Original source of the notebook: https://github.com/catboost/tutorials/blob/master/model_analysis/feature_importance_tutorial.ipynb*\n",
    "**Credits to the creators of the original notebook.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Weights and Biases"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "# Install W&B client\n",
    "git clone https://github.com/neomatrix369/client\n",
    "cd client\n",
    "git checkout add_lightgbm_plot_feature_importances_functionality\n",
    "\n",
    "python setup.py install\n",
    "\n",
    "# when the functionality is released on PyPI you would do the below:\n",
    "#!pip install wandb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Go to https://www.wandb.com/, create an account or login if you already have an account\n",
    "# Create your project and ensure you have obtained a W&B Token (after logging in and creating your project)\n",
    "\n",
    "!wandb login ${WANDB_TOKEN}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import wandb\n",
    "from wandb.lightgbm import plot_feature_importances, wandb_callback"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "                Logging results to <a href=\"https://wandb.com\" target=\"_blank\">Weights & Biases</a> <a href=\"https://docs.wandb.com/integrations/jupyter.html\" target=\"_blank\">(Documentation)</a>.<br/>\n",
       "                Project page: <a href=\"https://app.wandb.ai/neomatrix369/lightgbm_plot_feature_importances\" target=\"_blank\">https://app.wandb.ai/neomatrix369/lightgbm_plot_feature_importances</a><br/>\n",
       "                Run page: <a href=\"https://app.wandb.ai/neomatrix369/lightgbm_plot_feature_importances/runs/1a2s5c54\" target=\"_blank\">https://app.wandb.ai/neomatrix369/lightgbm_plot_feature_importances/runs/1a2s5c54</a><br/>\n",
       "            "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "W&B Run: https://app.wandb.ai/neomatrix369/lightgbm_plot_feature_importances/runs/1a2s5c54"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Here you would enter the name of your project created in the above step\n",
    "\n",
    "wandb.init(project='lightgbm_plot_feature_importances')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### LightGBM Feature Importance"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Sometimes it is very important to understand which feature made the greatest contribution to the final result. To do this, the LightGBM model has a feature_importance() method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import lightgbm as lgb\n",
    "from sklearn.datasets import make_classification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### First, let's prepare the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "X, y = make_classification(n_samples=10000, n_features=10, n_informative=2, n_redundant=5, random_state=42)\n",
    "\n",
    "dataset = lgb.Dataset(X, y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "ten_features = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Let's train LightGBM:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/satyasai/miniconda3/lib/python3.6/site-packages/lightgbm/engine.py:155: UserWarning: Found `early_stopping_rounds` in params. Will use it instead of argument\n",
      "  warnings.warn(\"Found `{}` in params. Will use it instead of argument\".format(alias))\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1]\ttrain's rmse: 0.494761\n",
      "Training until validation scores don't improve for 200 rounds\n",
      "[2]\ttrain's rmse: 0.489477\n",
      "[3]\ttrain's rmse: 0.484513\n",
      "[4]\ttrain's rmse: 0.479594\n",
      "[5]\ttrain's rmse: 0.474535\n",
      "[6]\ttrain's rmse: 0.469726\n",
      "[7]\ttrain's rmse: 0.465129\n",
      "[8]\ttrain's rmse: 0.460319\n",
      "[9]\ttrain's rmse: 0.455888\n",
      "[10]\ttrain's rmse: 0.45157\n",
      "[11]\ttrain's rmse: 0.447242\n",
      "[12]\ttrain's rmse: 0.443106\n",
      "[13]\ttrain's rmse: 0.438825\n",
      "[14]\ttrain's rmse: 0.434828\n",
      "[15]\ttrain's rmse: 0.430664\n",
      "[16]\ttrain's rmse: 0.426871\n",
      "[17]\ttrain's rmse: 0.422981\n",
      "[18]\ttrain's rmse: 0.419085\n",
      "[19]\ttrain's rmse: 0.415286\n",
      "[20]\ttrain's rmse: 0.41162\n",
      "[21]\ttrain's rmse: 0.408093\n",
      "[22]\ttrain's rmse: 0.404503\n",
      "[23]\ttrain's rmse: 0.400969\n",
      "[24]\ttrain's rmse: 0.397703\n",
      "[25]\ttrain's rmse: 0.3946\n",
      "[26]\ttrain's rmse: 0.391362\n",
      "[27]\ttrain's rmse: 0.388256\n",
      "[28]\ttrain's rmse: 0.385211\n",
      "[29]\ttrain's rmse: 0.382057\n",
      "[30]\ttrain's rmse: 0.379212\n",
      "[31]\ttrain's rmse: 0.37634\n",
      "[32]\ttrain's rmse: 0.373503\n",
      "[33]\ttrain's rmse: 0.370766\n",
      "[34]\ttrain's rmse: 0.36803\n",
      "[35]\ttrain's rmse: 0.365489\n",
      "[36]\ttrain's rmse: 0.362911\n",
      "[37]\ttrain's rmse: 0.360333\n",
      "[38]\ttrain's rmse: 0.357851\n",
      "[39]\ttrain's rmse: 0.355366\n",
      "[40]\ttrain's rmse: 0.35288\n",
      "[41]\ttrain's rmse: 0.350648\n",
      "[42]\ttrain's rmse: 0.34835\n",
      "[43]\ttrain's rmse: 0.3461\n",
      "[44]\ttrain's rmse: 0.34384\n",
      "[45]\ttrain's rmse: 0.341601\n",
      "[46]\ttrain's rmse: 0.339483\n",
      "[47]\ttrain's rmse: 0.337527\n",
      "[48]\ttrain's rmse: 0.335625\n",
      "[49]\ttrain's rmse: 0.333558\n",
      "[50]\ttrain's rmse: 0.331604\n",
      "[51]\ttrain's rmse: 0.32975\n",
      "[52]\ttrain's rmse: 0.327857\n",
      "[53]\ttrain's rmse: 0.326151\n",
      "[54]\ttrain's rmse: 0.32436\n",
      "[55]\ttrain's rmse: 0.322637\n",
      "[56]\ttrain's rmse: 0.320999\n",
      "[57]\ttrain's rmse: 0.319348\n",
      "[58]\ttrain's rmse: 0.317589\n",
      "[59]\ttrain's rmse: 0.315932\n",
      "[60]\ttrain's rmse: 0.314279\n",
      "[61]\ttrain's rmse: 0.312762\n",
      "[62]\ttrain's rmse: 0.311186\n",
      "[63]\ttrain's rmse: 0.309605\n",
      "[64]\ttrain's rmse: 0.308211\n",
      "[65]\ttrain's rmse: 0.306764\n",
      "[66]\ttrain's rmse: 0.305417\n",
      "[67]\ttrain's rmse: 0.304028\n",
      "[68]\ttrain's rmse: 0.302588\n",
      "[69]\ttrain's rmse: 0.301153\n",
      "[70]\ttrain's rmse: 0.299797\n",
      "[71]\ttrain's rmse: 0.298462\n",
      "[72]\ttrain's rmse: 0.297229\n",
      "[73]\ttrain's rmse: 0.296037\n",
      "[74]\ttrain's rmse: 0.29479\n",
      "[75]\ttrain's rmse: 0.293563\n",
      "[76]\ttrain's rmse: 0.292385\n",
      "[77]\ttrain's rmse: 0.291237\n",
      "[78]\ttrain's rmse: 0.290194\n",
      "[79]\ttrain's rmse: 0.28911\n",
      "[80]\ttrain's rmse: 0.288041\n",
      "[81]\ttrain's rmse: 0.287008\n",
      "[82]\ttrain's rmse: 0.286027\n",
      "[83]\ttrain's rmse: 0.285072\n",
      "[84]\ttrain's rmse: 0.284033\n",
      "[85]\ttrain's rmse: 0.28307\n",
      "[86]\ttrain's rmse: 0.282138\n",
      "[87]\ttrain's rmse: 0.281244\n",
      "[88]\ttrain's rmse: 0.280416\n",
      "[89]\ttrain's rmse: 0.27952\n",
      "[90]\ttrain's rmse: 0.278676\n",
      "[91]\ttrain's rmse: 0.277884\n",
      "[92]\ttrain's rmse: 0.27705\n",
      "[93]\ttrain's rmse: 0.276243\n",
      "[94]\ttrain's rmse: 0.275474\n",
      "[95]\ttrain's rmse: 0.274689\n",
      "[96]\ttrain's rmse: 0.273964\n",
      "[97]\ttrain's rmse: 0.273188\n",
      "[98]\ttrain's rmse: 0.272416\n",
      "[99]\ttrain's rmse: 0.271641\n",
      "[100]\ttrain's rmse: 0.270869\n",
      "Did not meet early stopping. Best iteration is:\n",
      "[100]\ttrain's rmse: 0.270869\n",
      "CPU times: user 8.29 s, sys: 111 ms, total: 8.4 s\n",
      "Wall time: 1.95 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "params = {'learning_rate': 0.01, 'max_depth': -1, 'num_leaves': 4, 'objective': 'fair', 'boosting': 'gbdt',\n",
    "              'boost_from_average': True, 'feature_fraction': 0.9, 'bagging_freq': 1, 'bagging_fraction': 0.5,\n",
    "              'early_stopping_rounds': 200, 'metric': 'rmse', 'max_bin': 255, 'n_jobs': -1, 'verbosity': -1,\n",
    "              'bagging_seed': 1234}\n",
    "model = lgb.train(params, dataset, valid_sets=[dataset], valid_names=['train'], callbacks=[wandb_callback()])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### W&B Feature Importance visualisation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Go to your dashboard to see the LightGBM's Feature Importances graph plotted.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<wandb.viz.Visualize at 0x7ff4e0804e48>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "plot_feature_importances(model, feature_names=ten_features)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## You should see a nice bar chart with the title Feature importances after training with the LightGBM model, in your W&B Dashboard on https://app.wandb.ai/[your_username]/[your_project_name]/runs/[your_run_id]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
